Permit non-consecutive increments in validator set #1137
base: main
Conversation
Changes the validator set successions to still be sequential, but non-consecutive. For bridge security against collusion by validators, it's important that these validators are still bonded, but the timing of the signature itself is secondary. As such, even if some validators have rotated out, this change permits keeping the bridge alive so long as 2/3rds of the `currentValidatorSet` are still bonded and sign the commitment.

Co-authored-by: bhargavbh <[email protected]>
Co-authored-by: Alistair Stewart <[email protected]>
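In code terms, the relaxed succession rule amounts to replacing an equality check on the incoming validator set ID with a strict inequality. A minimal illustrative sketch, in Go rather than the actual Solidity in BeefyClient.sol (function and parameter names here are invented for illustration):

```go
package main

import "fmt"

// isValidSuccession sketches the relaxed rule this PR introduces:
// the incoming commitment's validator set ID must be strictly greater
// than the set the light client currently knows, rather than exactly
// current+1 as before. Not the repo's actual code.
func isValidSuccession(currentSetID, commitmentSetID uint64) bool {
	// Before this change: commitmentSetID == currentSetID+1
	// After this change: any strictly increasing set ID is accepted.
	return commitmentSetID > currentSetID
}

func main() {
	fmt.Println(isValidSuccession(4, 5)) // consecutive session: accepted
	fmt.Println(isValidSuccession(4, 8)) // sessions skipped: now also accepted
	fmt.Println(isValidSuccession(4, 4)) // stale or replayed set: rejected
}
```

The security argument in the PR description is what justifies the inequality: as long as 2/3rds of the known set are still bonded and sign, the exact session in which they sign is secondary.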
Codecov Report

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main    #1137   +/-   ##
=======================================
  Coverage   49.66%   49.66%
=======================================
  Files          63       63
  Lines        3707     3707
  Branches       72       72
=======================================
  Hits         1841     1841
  Misses       1849     1849
  Partials       17       17
```

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
Thanks @Lederstrumpf for summarising our discussions and creating this PR. Just to add: for the light client to not necessarily follow every consecutive validator set change, there are 2 requirements:
There must also exist a commitment where the overlapping 2/3rds has signed. Would this be the mandatory commitment for the session? Do all validators have to sign a mandatory commitment?
No: a voting round concludes once the threshold of required signatures is reached, so at least 2/3rds have to sign to produce a finalized commitment. Hence, I would start off with the conservative implementation below.
It's mandatory in BEEFY to sign the first block of each new session, i.e. where the new authority set kicks in. Although the previous authority set signs to hand over to the new one, there is no guarantee of a 2/3rds overlap between the two sets. For instance, in principle, validators could all rotate their session keys from one session to the next, so the BEEFY authorities would seem distinct even though the underlying economic identities remain the same. We could relax the requirement that validators sign every first block of a session, making it mandatory only if any keys have changed since the last session, in the voting protocol itself (mirroring GRANDPA), but the real benefit is in just relaxing this requirement for the relayer:
Co-authored-by: bhargavbh <[email protected]>
snowbridge/contracts/src/BeefyClient.sol, line 370 (commit 4a90854):
Does this line also require a change? Assuming there is no BEEFY authority change from session 4 to 6 and we skip these mandatory commitments, then after switching from the (current, next) pair (3, 4), the new pair will be (4, 8) instead of (7, 8).
Good observation, and an important point to discuss:

I reckon we don't need to change this line, but the minimal change does have a caveat. TL;DR: after the session hop, the next commitment will have to be on the
This is correct. In your example, after the update from a commitment from session 7, the client will deem session 4 to be the current one, but it will have updated the next validator set. Before seeing a commitment from session 8, the light client would still see the authorities from session 4 as the current ones, but since it will already check that
The caveat of the minimal change (as it is now) is that the light client will ignore any commitments from session 7, unless we remap signatures from session 8 to it (in which case they would also trigger the
If you want to jump without skipping the
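The double-hop behaviour discussed in this exchange can be modelled with a tiny simulation. This is a hypothetical model of the minimal change (the `pair` type and `applyHandover` function are invented here; the real update logic lives in BeefyClient.sol):

```go
package main

import "fmt"

// pair models the light client's stored (current, next) validator set IDs.
type pair struct{ current, next uint64 }

// applyHandover models the minimal-change update on a handover commitment:
// the client promotes next to current and stores the commitment's
// nextAuthoritySet ID, without remapping to the signing session itself.
func (p *pair) applyHandover(nextAuthoritySetID uint64) {
	p.current = p.next
	p.next = nextAuthoritySetID
}

func main() {
	p := pair{current: 3, next: 4}
	// Sessions 4-6 had no effective key changes and were skipped;
	// a commitment signed in session 7 carries nextAuthoritySet = 8.
	p.applyHandover(8)
	fmt.Println(p) // {4 8}: current lags the signing session, as noted above
	// The follow-up commitment from session 8 restores lockstep.
	p.applyHandover(9)
	fmt.Println(p) // {8 9}
}
```

This reproduces the (4, 8) state from the example above: after the hop, the client's "current" set is the old "next", and only the subsequent commitment brings the pair back in step.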
Thanks @Lederstrumpf for the detailed explanation; it all makes sense. Just checked the recent history. We may improve towards the aggressive implementation (with the threshold and remapping-of-indices stuff) later.
I cannot reproduce all validators signing the mandatory commitment in our local testnet. It seems validators stop signing once 2/3rds have signed the mandatory commitment. This can be an issue. The relayer performs 3 steps:
Recovery depends on:
So potentially the recovery would need to retry until getting a PrevRandao that chooses validators from the VSLKOE. We may need a new call on the light client specific to recovery, but using the same principles as this commit.
Hi @alistair-singh, BEEFY letting only 2/3rds+1 validators sign in a round seems problematic for this change. This modification requires some more thought.
I am not so sure of the concerns about the light client making multiple attempts to subsample the right set of validators. We first make a mapping between validators from the recovery_crafting_session and the VSLKOE. The light client would now subsample only from bitfields (obtained after the mapping transformation) that are set to 1 in commitPrevRandao. I fail to see how this would be different from any other subsampling situation. If we have the signatures of the overlapping 2/3rds in the VSLKOE (which may not be the case due to the issues described above), then subsampling should not be a problem. To conclude, subsampling would only be an issue in conjunction with the issue of "2/3rds of current validators in VSLKOE", and is not independently an issue. Is my understanding right?
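The mapping transformation described above could look roughly like the following sketch (Go, purely illustrative; `remapBitfield` and the index-map convention are assumptions, not code from this repo):

```go
package main

import "fmt"

// remapBitfield re-indexes a signature bitfield collected against the
// recovery-crafting session's validator ordering into the VSLKOE
// ordering, so that subsampling only ever lands on known validators.
// indexMap[i] gives the VSLKOE index of crafting-session validator i,
// or -1 if that validator is not in the VSLKOE.
func remapBitfield(bits []bool, indexMap []int, vslkoeLen int) []bool {
	out := make([]bool, vslkoeLen)
	for i, set := range bits {
		if set && indexMap[i] >= 0 {
			out[indexMap[i]] = true
		}
	}
	return out
}

func main() {
	// Invented example: the crafting session has 4 validators; validators
	// 0 and 2 also sit in the VSLKOE (at positions 2 and 1 respectively),
	// validator 3 signed but has since left the VSLKOE, validator 1 did
	// not sign.
	bits := []bool{true, false, true, true}
	indexMap := []int{2, 0, 1, -1}
	fmt.Println(remapBitfield(bits, indexMap, 3)) // [false true true]
}
```

Bits belonging to validators outside the VSLKOE are simply dropped, which is exactly why the remapped commitment can fall under 2/3rds even when the original round reached threshold.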
So the subsample will fail if it chooses a validator not in the VSLKOE. We have to include validators that are not in the VSLKOE when calling
Yes. But I would say it's the commitment not being signed by 2/3rds of the VSLKOE, rather than "2/3rds of current validators in VSLKOE". Random sampling that might fail in this context will make it hard to predict the time and cost of recovery.
Thanks for clarifying. Agreed! Indeed the subsampling issue stems from the fact that 2/3rds of the VSLKOE validators are not signing the crafted commitment. This in turn is a result of validators not signing a commitment in a BEEFY round once they see 2/3rds signatures from other validators. At an initial glance, the BEEFY rule of moving to the next round once 2/3rds of signatures are seen is for efficiency reasons, and is not necessarily needed for safety/liveness. One approach could be relaxing the rule, but there are several trade-offs between the efficiency of BEEFY and redundant signatures that are useful only in case of bridge recovery. We also have to take into consideration how complex the changes are on the BEEFY side. We will put some more thought into it and get back.
We had a discussion internally and this is our opinion. The BEEFY rule that makes validators skip signing a commitment if they have already seen 2/3rds of other signatures can actually be relaxed, and seems like a good idea, especially for mandatory blocks.
It is probably an incremental improvement and does not completely solve the recovery problem. Future Plans: |
Thanks @bhargavbh, I will discuss going forward with this merge with the team. But I think the main blocker for me is that we need some unit tests for this scenario. I will probably have to add these tests myself, as I know how to generate data from our testnet.
Some data to support our opinion here: the conservative change proposed above performs no remapping and sends a commitment whenever the auth set's keys effectively change (note the BEEFY auth set changes on every session via paritytech/substrate#11269, irrespective of key changes). Historic stats:

Kusama

Polkadot

So even with only the conservative change for session skipping, the expected operational cost of relaying commitments for altruistic relayers* is less than a quarter of the current cost.

*: i.e. the ones relaying commitments only to keep the bridge alive - if more demand warrants more frequent commitment updates: great!
If you send me / push some additional finality proofs from your testnet à la https://github.com/Snowfork/snowbridge/tree/a223f97a2c21f14a83aa9786644f9e81f94c1fce/contracts/test/data/beefy-final-proof-0.json, I'm happy to add the unit tests I had in mind wrt. the double hop behavior.
E.g. this should be tested.
Hey @Lederstrumpf! I am closing this PR for now, since it looks like we won't address this until launch. Happy to reopen once we have decided to make this improvement. |
Oops, sorry @Lederstrumpf @bhargavbh, we closed this too hastily. I am keen on exploring this so that we can add it post-launch. It seems we are actually going to launch within the next few weeks. This means that, for better or worse, we now have a soft code freeze. The gas savings introduced here are compelling. I'm just a bit worried about the complexity arising in this thread #1137 (comment) (remapping of validator sets). After adding your improvements, the existing
So here is the documentation for re-generating the test data.
But it’s not useful on its own. It needs to be modified to support this case. The process goes like this:
Now the above 3 steps generate data for a single SubmitInitial/SubmitFinal cycle, built from the same commitment and validator set from step 1 above. There are two sets of data for two cases, for which we have specialized files.
The two cases above have two tests each:
The test case you will need will exercise multiple SubmitInitial/SubmitFinal cycles through the light client. There will be 3 cycles of SubmitInitial/SubmitFinal.
Within those 3 cycles of SubmitInitial/SubmitFinal we would probably want to test the following cases, which will also need to be generated:
I'd like to see some rough analysis of on-chain data supporting that this will make any difference in practice, regardless of approach. The bridge is getting good traffic and this will only increase; I expect there will be at least one P->E message every 4 hours anyway, so at least one finality relay will happen because of demand anyway. The optimization is worth it if data shows inactivity periods where this actually saves money.
Changes the validator set successions to still be strictly increasing, but not necessarily consecutive.
This can greatly improve cost-efficiency of the bridge, but is also useful in recovery scenarios like paritytech/polkadot-sdk#3080, where only one signed commitment would have to be crafted for a recovery, instead of one for every missed session.
Rationale
For bridge security against collusion by validators, it's important that these validators are still bonded, but the timing of the signature itself is secondary. Only bonded validators are still slashable, and to a bonded validator, neither the probability of succeeding with an attack nor the probability of getting caught is influenced by the timing of the signature. It is in fact irrelevant whether they sign a payload while being part of a particular validator set `n` or if they happened to be part of set `n+m` later (as long as they are still bonded).

While it is important that BEEFY produces a commitment for every handover of validator sets, it would be beneficial to not relay a block every session unless the active identities have changed.
For the BEEFY light client, while it will naturally still check a signed commitment against the authorities defined in the `currentValidatorSet` (read: validator set last known on Ethereum [VSLKOE]), as long as the signatures are still ordered as expected for the bitfield checks, the commitment can pass even if it was signed outside of the VSLKOE's mandate within the BEEFY voting protocol.

While the validator set does change over time, it is largely preserved from session to session.
As such, even if some validators have rotated out, this change permits keeping the bridge alive so long as 2/3rds of the `currentValidatorSet` are still bonded and sign the commitment, either for recovery or cost-optimization purposes.

Next Steps
This change is a prerequisite to allowing relayers to skip sessions for block relaying:
https://github.com/Snowfork/snowbridge/blob/main/relayer/relays/beefy/polkadot-listener.go#L115-L125
For a cost-efficient bridge, the next step is to adjust the (common-good) relayer logic to only relay blocks if there has been a change in the keys of the upcoming validator set.
More aggressive approaches that additionally don't relay if only small quantities of keys have changed are possible too, but also require a more bespoke implementation.
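The conservative relayer policy described above (relay a mandatory commitment only when the upcoming set's keys differ from the set the light client already knows) could look roughly like this sketch. It is in Go since the relayer is written in Go, but the types and function names here are invented, not the actual polkadot-listener.go code:

```go
package main

import (
	"bytes"
	"fmt"
)

// keysChanged reports whether the upcoming authority set's key material
// differs from the set the light client already knows. Keys are modelled
// as raw bytes here; the real relayer works with typed BEEFY authorities.
func keysChanged(known, upcoming [][]byte) bool {
	if len(known) != len(upcoming) {
		return true
	}
	for i := range known {
		if !bytes.Equal(known[i], upcoming[i]) {
			return true
		}
	}
	return false
}

// shouldRelay implements the conservative policy: skip the mandatory
// commitment whenever the keys are effectively unchanged.
func shouldRelay(known, upcoming [][]byte) bool {
	return keysChanged(known, upcoming)
}

func main() {
	known := [][]byte{[]byte("key-a"), []byte("key-b")}
	same := [][]byte{[]byte("key-a"), []byte("key-b")}
	rotated := [][]byte{[]byte("key-a"), []byte("key-c")}
	fmt.Println(shouldRelay(known, same))    // false: skip this session
	fmt.Println(shouldRelay(known, rotated)) // true: relay the handover
}
```

Since BEEFY rotates the authority set ID every session even when keys are unchanged, this key comparison (rather than a set-ID comparison) is what lets the relayer skip sessions safely under the relaxed succession rule.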
cc @AlistairStewart @bhargavbh